Forensic Science International: Genetics — Latest Matching Preprints

1

Improving the Accuracy of Forensic Age Estimation Through Bias Reduction

Flores, M.; Pellegrini, M.

2026-06-03 bioinformatics 10.64898/2026.05.30.728628 medRxiv

Top 0.1%

89.5%

Chronological age estimation can provide supporting information in forensic casework when traditional identification methods are limited. DNA methylation, a stable epigenetic mark, has emerged as a promising tool for predicting chronological age from trace samples. However, many existing age estimation models rely on linear regression, which often yields biased prediction errors across the age distribution (i.e. model residuals show a significant age dependence). In this study, we compared three approaches to age estimation modeling: multivariable linear regression, random forest regression and maximum likelihood estimation. While the first two approaches are well established, for the third we constructed and validated a DNA methylation-based LOESS regression maximum likelihood model for age estimation using forensically relevant CpG markers. In all cases, model performance was evaluated by leave-one-out cross-validation (LOOCV). We used three independent, publicly accessible methylation datasets generated with droplet digital PCR (ddPCR) to identify the most effective method in terms of accuracy and bias. Notably, the maximum likelihood approach showed less age-associated bias in its residuals than either multivariable linear regression or random forest regression. These findings highlight the utility of non-linear modeling techniques in reducing the biases of epigenetic age estimation for forensic applications.
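The residual-bias problem can be illustrated with a small simulation. This is an assumption-laden toy, not the authors' LOESS model: it uses one hypothetical CpG whose methylation rises linearly with age and compares two approaches. The first regresses age on methylation directly. The second fits a forward model (methylation as a function of age, constant Gaussian noise) and inverts it by maximum likelihood. Both are scored by LOOCV. The helper names and simulated values are illustrative only.

```python
import random

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def fit_line(xs, ys):
    """Ordinary least squares for ys = a + b * xs; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def loocv_residuals(meth, age, method):
    """Leave-one-out CV; returns predicted minus true age for every sample."""
    residuals = []
    for i in range(len(age)):
        m_train = meth[:i] + meth[i + 1:]
        a_train = age[:i] + age[i + 1:]
        if method == "direct":
            # Regress age on methylation and predict the held-out age directly.
            a, b = fit_line(m_train, a_train)
            pred = a + b * meth[i]
        else:
            # Fit the forward model methylation = a + b * age. With constant
            # Gaussian noise, the age maximizing the likelihood of the held-out
            # methylation value is this closed-form inversion.
            a, b = fit_line(a_train, m_train)
            pred = (meth[i] - a) / b
        residuals.append(pred - age[i])
    return residuals

# Simulated (hypothetical) data: one CpG whose methylation rises with age.
random.seed(1)
ages = [random.uniform(18, 80) for _ in range(400)]
meth = [0.2 + 0.005 * t + random.gauss(0, 0.05) for t in ages]

for method in ("direct", "ml"):
    res = loocv_residuals(meth, ages, method)
    slope = fit_line(ages, res)[1]  # age dependence of the residuals
    mae = mean([abs(r) for r in res])
    print(f"{method}: MAE = {mae:.1f} y, residual slope = {slope:+.2f}")
```

Under these assumptions, the direct regression shows a clearly negative residual slope: young samples are over-estimated and old samples under-estimated. The inversion's residual slope stays near zero, though its individual estimates are typically noisier. A LOESS forward model, as in the preprint, would replace the straight-line fit with a local smoother and a numerical search over candidate ages.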

2

Whole-genome sequencing of a mid-20th-century femur from central Israel in an open missing-person case

Vol, E.; Waldman, S.; Lomes, A.; Brielle, E. S.; Appel, N.; Dolin, B.; Asif, S.; Nagar, Y.; Marco, E.; Bergman, N.; Khaner, O.; Raviv, D.; Oliel, J.; Lewis, R. Y.; Carmi, S.

2026-04-28 genetics 10.64898/2026.04.24.720291 medRxiv

Top 0.1%

47.2%

Genome-wide technologies can generate investigative leads in cold cases by determining the genetic ancestry of the forensic sample. Increasingly, DNA extraction followed by whole-genome sequencing or genotyping is being used to analyze early- or mid-20th-century skeletal remains. Here, we present the first case, to our knowledge, of whole-genome sequencing of a mid-20th-century bone sample from the Middle East. A femur discovered in a cave in central Israel was proposed to belong to a person of Ashkenazi Jewish ancestry who had been missing since 1948. Following DNA extraction and single-stranded library preparation, whole-genome sequencing generated nearly 500 million reads. However, only 0.5% of the reads mapped to the human genome, providing a depth of coverage of 0.07x. After quality control and inference of male sex, ancestry was assigned using principal component and ADMIXTURE analyses. The results clearly indicated that the genome belonged to a person of Arab ancestry, refuting the hypothesis of an Ashkenazi Jewish origin.
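The reported figures are linked by a back-of-envelope relation: mean depth equals the number of mapped reads times their mean aligned length, divided by the genome length. The abstract does not give read length, duplicate rate or the reference length used. The sketch therefore assumes a ~3.1 Gb human genome and ignores duplicate removal and quality filtering. It solves for the mean aligned length that the stated 0.5% mapping rate and 0.07x depth would imply. The function names are hypothetical.

```python
def depth_of_coverage(total_reads, mapped_fraction, mean_len_bp, genome_bp=3.1e9):
    """Mean depth = mapped reads x mean aligned length / genome length."""
    return total_reads * mapped_fraction * mean_len_bp / genome_bp

def implied_mean_length(total_reads, mapped_fraction, depth, genome_bp=3.1e9):
    """Invert the same relation: mean aligned length consistent with a given depth."""
    return depth * genome_bp / (total_reads * mapped_fraction)

# 500 million reads x 0.5% mapped = 2.5 million mapped reads (assumed genome ~3.1 Gb).
print(implied_mean_length(500e6, 0.005, 0.07))  # roughly 87 bp
```

If duplicates were removed before depth was computed, fewer unique reads would remain, and the implied mean length would be correspondingly larger.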

3

Evaluating anonymized genome re-identification using polygenic predictions and its implications for data privacy

Cavinato, T.; Hofmeister, R. J.; Kutalik, Z.

2026-06-10 genetics 10.64898/2026.06.10.731306 medRxiv

Top 0.1%

11.9%

Re-identification by phenotypic prediction aims to determine whether a genome belongs to a specific individual by comparing the individual's known traits with those predicted from the genome. This type of tracing attack is widely discussed in the genomic privacy literature, yet previous studies have been criticized for overstating its practical risks. Over the past decade, genome-wide association studies (GWAS) with increasing sample sizes have improved the accuracy of phenotypic prediction, potentially enhancing such attacks. To quantify their real-world threat, we developed a probabilistic framework that estimates the likelihood of a match between an individual's observed traits and polygenic scores (PGS) derived from a genome. The framework accounts for prediction accuracy and for genetic and environmental correlations between the traits. We benchmarked this re-identification method and examined how the prior probability (the a priori chance that a random genome and set of traits correspond to the same person) affects performance. Finally, we assessed whether sensitive information could be inferred through this attack by attempting to predict multiple sensitive haplotypes, such as APOE-ε4 (linked with Alzheimer's disease). Our re-identification method outperformed a state-of-the-art tool and reached a precision above 99% at a recall of 40% when assuming a prior of 50%. However, under real-world settings, we estimated that realistic priors would not exceed 4 × 10⁻⁴ %, resulting in a precision below 0.13% at the same recall (40%). The inference of sensitive genotypes also proved ineffective: a precision above 50% for identifying APOE-ε4 carriers was achievable only at a recall below 20%. In conclusion, although re-identification by phenotypic prediction is technically feasible, our findings indicate that its effectiveness in real-world conditions is limited.
These results provide a counterpoint to earlier claims of severe genomic privacy risks and offer guidance for policymakers, biobank administrators, and research participants.
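Bayes' rule explains why a realistic prior collapses precision. The sketch below assumes a fixed classifier whose recall and false-positive rate do not change with the prior. It back-calculates the false-positive rate implied by the 50%-prior operating point and re-applies it at a prior of 4 × 10⁻⁴ % (4e-6 as a probability). It illustrates the direction and rough magnitude of the effect and is not expected to reproduce the authors' framework or their reported bound exactly. The function names are hypothetical.

```python
def precision(recall, fpr, prior):
    """Bayes' rule: P(true match | match called) for a given prior probability."""
    true_pos = recall * prior
    false_pos = fpr * (1.0 - prior)
    return true_pos / (true_pos + false_pos)

def fpr_for_precision(target_precision, recall, prior):
    """False-positive rate that yields target_precision at this recall and prior."""
    return recall * prior * (1.0 / target_precision - 1.0) / (1.0 - prior)

# Operating point from the abstract: precision 99% at recall 40% with prior 50%.
fpr = fpr_for_precision(0.99, 0.40, 0.50)    # about 0.004
# Same classifier (same recall and FPR) at a prior of 4e-4 % = 4e-6.
print(precision(0.40, fpr, 0.50))            # about 0.99
print(precision(0.40, fpr, 4e-6))            # about 0.0004
```

Even a false-positive rate of a few tenths of a percent is overwhelmed when true matches are that rare, which is the intuition behind the abstract's real-world conclusion.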

4

AssayBLAST v2: Major update improving reliability and reporting of the in silico analysis of molecular multi-parameter assays

Eulenfeld, T.; Collatz, M.; Braun, S. D.; Ehricht, R.

2026-04-29 bioinformatics 10.64898/2026.04.27.721032 medRxiv

Top 0.1%

5.5%

Introduction: Accurate in silico evaluation of primers and probes is essential for the rational design of molecular multi-parameter assays. We present AssayBLAST v2 to automate and simplify this process for extensive assay designs. Results: A newly integrated strand and proximity check enables precise validation of corresponding oligonucleotides, ensuring correct orientation and spacing for efficient amplification. Based on predicted oligonucleotide interactions, AssayBLAST v2 estimates amplification outcomes, offering a computational benchmark for downstream wet-lab validation and performance correlation. Additionally, the updated software integrates an adaptive BLAST parameter optimization that scales dynamically with database size, improving both analytical sensitivity and computational performance. These improvements are supported by a comparative evaluation against the previous version of AssayBLAST. Conclusions: Collectively, these enhancements streamline the assay development workflow, reduce costs associated with suboptimal primer and probe synthesis, and increase the robustness and reliability of molecular diagnostics and research applications.
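A strand and proximity check of the kind described can be reduced to two rules. First, forward and reverse primers must bind opposite strands, facing each other. Second, the resulting product must be short enough to amplify. The sketch below is not AssayBLAST code. The `Hit` record, the 1,000 bp default limit and the probe rule are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Hit:
    """One oligo alignment: 0-based half-open coordinates on the target's + strand."""
    oligo: str
    start: int
    end: int
    strand: str  # "+" or "-"

def amplicon_length(fwd: Hit, rev: Hit, max_len: int = 1000) -> Optional[int]:
    """Length of the predicted product, or None if orientation/spacing is wrong."""
    if fwd.strand != "+" or rev.strand != "-":
        return None  # primers must bind opposite strands
    if rev.start < fwd.end:
        return None  # reverse primer overlaps or lies upstream: no facing pair
    length = rev.end - fwd.start
    return length if length <= max_len else None  # too far apart to amplify

def probe_between(probe: Hit, fwd: Hit, rev: Hit) -> bool:
    """A probe is informative only if it binds inside the amplified region."""
    return fwd.end <= probe.start and probe.end <= rev.start

f1, r1 = Hit("F1", 100, 120, "+"), Hit("R1", 300, 320, "-")
print(amplicon_length(f1, r1))                             # 220
print(probe_between(Hit("P1", 150, 175, "+"), f1, r1))     # True
```

A real tool would also score mismatches near the 3' end and consider hits on every database sequence, but the orientation and spacing logic is the core of the check.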

5

Shark sexing from forensic, archival, and developmental samples using sex-linked DNA markers

Akane, O.; Kawaguchi, Y. W.; Niwa, T.; Uno, Y.; Kuraku, S.

2026-05-06 ecology 10.64898/2026.05.02.722412 medRxiv

Top 0.1%

4.4%

The effective management of threatened shark populations relies on accurate demographic data, particularly operational sex ratios. In intact shark bodies, sex identification is straightforward through the presence of external male organs (claspers). It is impossible, however, for processed fins in the illegal wildlife trade, early-stage embryos in breeding programs, or archived tissue fragments and blood samples in which morphological traits are lost. Here, we present a robust molecular sexing framework leveraging recently identified sequences from shark sex chromosomes, which, to our current knowledge, are consistently organized as an XY system. Our approach consists of two distinct methodologies tailored to the current identification status of sex chromosome sequences in the target species. For the whale shark Rhincodon typus and the brownbanded bamboo shark Chiloscyllium punctatum, we employed end-point PCR assays targeting male-specific Y-linked markers. For the cloudy catshark Scyliorhinus torazame, we developed a quantitative PCR (qPCR) assay targeting differential X chromosome dosage. In this dosage-based system, females (XX) are distinguished by an amplification profile approximately one cycle earlier than that of males (XY). By integrating X-linked dosage quantification, our framework provides a critical internal control that significantly enhances reliability, allowing researchers to distinguish true females from PCR failures. This toolkit offers a versatile solution for diverse applications, ranging from the study of sex determination mechanisms in pre-phenotypic embryos to the reconstruction of sex ratios from space-constrained tissue archives and global wildlife forensics, thereby contributing to the comprehensive conservation of shark biodiversity.
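The "approximately one cycle earlier" figure follows from PCR doubling: with perfect efficiency, twice the starting template reaches the detection threshold one cycle sooner. The sketch below assumes a normalizing autosomal reference assay and a 0.5-cycle decision threshold. Neither is specified in the abstract, and in practice such a threshold would need calibration on samples of known sex. The function names are hypothetical.

```python
import math
from typing import Optional

def expected_delta_cq(copy_ratio: float, efficiency: float = 1.0) -> float:
    """Cq shift between templates whose starting amounts differ copy_ratio-fold.
    Each cycle multiplies product by (1 + efficiency), so shift = log(ratio) / log(1 + E)."""
    return math.log(copy_ratio) / math.log(1.0 + efficiency)

def call_sex(cq_x: Optional[float], cq_ref: Optional[float], threshold: float = 0.5) -> str:
    """Hypothetical rule: delta = Cq(X-linked) - Cq(autosomal reference).
    XX carries as many X copies as autosomal copies (delta ~ 0); XY carries half (delta ~ +1)."""
    if cq_x is None or cq_ref is None:
        return "PCR failure"  # no amplification: do not infer sex
    return "female (XX)" if cq_x - cq_ref < threshold else "male (XY)"

print(expected_delta_cq(2.0))        # 1.0 cycle at 100% efficiency
print(expected_delta_cq(2.0, 0.9))   # about 1.08 cycles at 90% efficiency
```

Because the X-linked assay is expected to amplify in both sexes, a missing Cq signals a failed reaction rather than a female. This is the internal-control logic the abstract describes.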

6

Impact Of Fluorescent Dyes On Mutations In Next Generation Sequencing Library Preparation

Butty, V.; Patel, P.; Levine, S. S.

2026-04-29 molecular biology 10.64898/2026.04.26.720908 medRxiv

Top 0.1%

4.1%

DNA-labelling fluorescent dyes such as ethidium bromide have long been considered highly mutagenic during DNA replication. While recent studies have pushed back on this narrative, the intercalative nature of these dyes continues to raise the possibility that they can induce mutations. The iconPCR instrument by n6tec uses fluorescent dyes to measure amplification in real time and to adjust cycling conditions. Because this use of qPCR is preparative rather than analytical, any mutations introduced by the dyes would be propagated into the sequencing reaction. To assess the impact of these dyes on downstream analyses, we performed routine mutation calling as well as mutational signature analysis on samples amplified with the iconPCR in the presence of either SYBR or EvaGreen. Sequence analysis revealed minimal effects of the dyes, largely within the noise regime, with only subtle changes in mutation rates. Mutational signature analysis did not identify any key signatures attributable to the dyes in either the substitution or the indel domain. The mutational impact of intercalating dyes during fluorescence-guided amplification is therefore minimal and can be disregarded in all but the most sensitive NGS applications.

7

Cultural affiliation accounts for most of the spatiotemporal variation in burial rite practices

Canteri, E.; Staniuk, R.; Timpson, A.; Schauer, P.; Bulatovic, J.; Ivanova-Bieg, M.; Reiter, S. S.; Rose, H. A.; Kolar, J.; Thomas, M. G.; Racimo, F.; Shennan, S.

2026-05-28 genetics 10.64898/2026.05.25.725982 medRxiv

Top 0.1%

2.4%

Describing and interpreting spatiotemporal patterns in human culture has been a central focus of anthropology and archaeology for over a century. Recent ethnographic studies have highlighted the complexity of the processes generating these patterns, including isolation-by-distance, homophily, and common descent. However, investigating these processes in prehistoric archaeology remains challenging. Here we use a new interdisciplinary database and a combined dataset of ancient DNA (aDNA) genomic sequences to analyse the relationship between spatiotemporal patterns in cultural and genomic variation. Using a Gaussian process model, we test whether broadly defined clusters of genomic affinities correspond to spatiotemporal changes in burial rites while controlling for other factors. We use data from the Big Interdisciplinary Archaeological Database (BIAD), linking mortuary information from ~4,200 individuals with genetic ancestry and mobility data inferred from over 1,300 human genomes from Western Eurasia spanning ~10,000-2,000 BP. By integrating and modelling these diverse datasets, we aim to provide a detailed understanding of how genomic history intersects with cultural evolution. This offers new insights into the dynamics behind these complex processes and into the extent to which genes and culture are transmitted in parallel. For burial orientation, we found that cultural affiliation was the main factor accounting for variation, with little to no role for ancestry. For body position the picture was more mixed, but cultural affiliation again played an important role.

8

Multisite Evaluation of an Amplification-based Nanopore Sequencing Solution to Analyze Challenging Clinically Relevant Variants in Genes Associated with Hereditary Diseases

Filipovic-Sadic, S.; Parker, C. A.; Mihailovic, M. K.; Milligan, J. N.; Turner, J. M.; Borel, S. L.; Le, V.; Markulin, T.; Janovsky, J. W.; Killinger, B. J.; Deshotel, M. J.; Reading, N. S.; Fredrickson, E. K.; Ji, Y.; Close, D.; Wright, J.; Williams, M.; Barrie, E. S.; Martin, K. E.; Gray, S. M.; Haynes, B. C.; Hall, B.

2026-05-19 genetics 10.64898/2026.05.14.725224 medRxiv

Top 0.1%

1.7%

Purpose: Carrier screening for hereditary conditions is challenged by genes with complex genomic architecture, where short-read sequencing can fail to detect clinically relevant variants. This study evaluated a unified, amplification-based nanopore sequencing workflow across multiple laboratories for comprehensive analysis of such loci. Methods: A modular long-read sequencing assay was evaluated across five laboratories using targeted PCR enrichment, Oxford Nanopore sequencing, and automated variant analysis. The workflow interrogated genes associated with spinal muscular atrophy, thalassemia, cystic fibrosis, fragile X syndrome, congenital adrenal hyperplasia, Gaucher disease, and hemophilia A. Performance was assessed against orthogonal methods for single nucleotide variants (SNVs), indels, copy-number variants, repeat expansions, and structural rearrangements. Results: Across 882 unique samples (1,266 tests), overall agreement with comparator methods exceeded 96% for variant-level detection and 97% for genotype status classification. Long-read sequencing enabled phasing of paralogous loci and integrated sizing and interruption analysis of FMR1 repeats. It also enabled simultaneous detection of SNVs and structural variants in the globin loci and the CYP21A2-TNXB region, reducing reliance on multiple workflows. Conclusion: This multisite evaluation suggests that targeted long-read sequencing can consolidate complex variant detection into a single workflow, improving analytical completeness and operational efficiency for carrier screening.

9

Simulation-based Bayesian deep learning enables uncertainty-aware tumor fraction estimation in cell-free DNA

Volkov, H.; Raitses-Gurevich, M.; Grad, M.; Shlayem, R.; Danilevsky, A.; Rubinek, T.; Gorfine, M.; Shomron, N.

2026-06-19 bioinformatics 10.64898/2026.06.15.732265 medRxiv

Top 0.1%

1.7%

Background: Estimating tumor fraction (TF) from whole-genome cell-free DNA sequencing is critical for liquid biopsy but is hampered by weak signals and baseline noise at low tumor fractions. Existing computational methods often require matched controls or large labeled datasets for training, and they lack uncertainty quantification. To address these gaps, we developed purNPE, a Bayesian deep-learning framework trained without labeled cancer cell-free DNA samples. Specifically, purNPE leverages a two-part generative model. One component simulates diverse tumor copy-number profiles based on evolutionary genealogies. A second, data-driven component learns and replicates realistic sequencing background patterns from cancer-free cell-free DNA. By training a Neural Posterior Estimator on synthetic tumor profiles augmented with learned noise, purNPE performs amortized inference in milliseconds without needing a reference sample set at inference time. Results: In a real-world pan-cancer cohort, purNPE performed comparably to existing methods against orthogonal mutant-allele-fraction validation (MAE = 0.066). In silico and semi-synthetic experiments suggested an analytical sensitivity of around 1% TF under the evaluated conditions. They also showed strong classification accuracy at low TF (AUC = 0.98 for TF ≤ 3% versus controls). Conclusions: This work provides a framework for using simulation-based inference to derive calibrated, uncertainty-aware TF estimates, offering a potential alternative to traditional data-dependent methods.

10

Early-life dentine based elemental biodynamics and cord blood telomere length

Srinath, B.; Ravisekar, R.; Sachdev, K.; Eggers, J.; Torres Olascoaga, L. A.; McRae, N.; Lopez, I.; DeBolt, C. A.; Akinkugbe, A.; Ranchadiya, R.; Tellez-Rojo, M. M.; Gennings, C.; Wallace, R. B.; Wright, R.; Wright, R. J.; Arora, M.; Alcala, C. S.; Agrawal, M.; Lane, J. M.; Rosa, M. J.; Eggers, S. I.; Midya, V.

2026-05-01 epidemiology 10.64898/2026.04.30.26351974 medRxiv

Top 0.1%

1.6%

Background: Leukocyte telomere length (LTL) from cord blood is a marker of biological aging and long-term systemic health. Exposure to essential and toxic metals has been shown to influence LTL in a sexually dimorphic manner. However, little is known about the interplay between early-life longitudinal biodynamic patterns of these elements and cord blood LTL, or about potential sex differences. Methods: From an ongoing longitudinal birth cohort study in Mexico City, we used available tooth samples from 231 children (129 males and 102 females). From these, we generated weekly time series of direct fetal elemental intensities for 16 elements, from the second trimester through four to five months after birth. We analyzed the dentine growth rings using inductively coupled plasma mass spectrometry to generate time-resolved elemental intensities. The elements included were Li, Mg, Ca, Mn, Co, Ni, Cu, Zn, As, Sr, Mo, Cd, Sn, Ba, Pb, and Bi. LTL was measured in cord blood using qPCR. We used cross-recurrence quantification analysis and entropy-complexity-based measures to generate time-resolved features that quantify the synchronization of elemental biodynamics. A stability-selection approach using five-fold cross-validation of regularized ridge regression was used for feature selection, and covariate-adjusted linear models were used to estimate associations with LTL. Findings: The biodynamic interactions of Mg-Co and Mn-Sn were identified as the most stable features in male and female children, respectively.
In males, higher vertical entropy (i.e., higher variability) of Mg-Co temporal biodynamics was associated with shorter LTL (β [95% CI]: -0.09 [-0.14, -0.03]; p < 0.01), but this was not observed in females (β [95% CI]: -0.02 [-0.10, 0.06]; p = 0.60). Conversely, a higher recurrence rate (i.e., higher synchronicity) of Mn-Sn temporal biodynamics was associated with longer LTL in females (β [95% CI]: 0.09 [0.02, 0.16]; p = 0.01) but not in males (β [95% CI]: 0.03 [-0.04, 0.09]; p = 0.39). Interpretation: We demonstrate that time-varying synchronization of early-life multi-elemental biodynamics, a potential marker of homeostatic balance, may be associated with cord blood telomere length in a sexually dimorphic manner.
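For readers unfamiliar with cross-recurrence quantification, the two features named here can be sketched directly. The code assumes the simplest setting: no time-delay embedding, a fixed distance threshold `eps` and a minimum line length of 2. It also uses one common definition of vertical-line entropy. The preprint's embedding, threshold and exact entropy definition may differ, and the function names are hypothetical.

```python
import math
from collections import Counter

def cross_recurrence(x, y, eps):
    """R[i][j] = 1 if |x[i] - y[j]| <= eps (embedding dimension 1 for simplicity)."""
    return [[1 if abs(xi - yj) <= eps else 0 for yj in y] for xi in x]

def recurrence_rate(R):
    """Share of (i, j) pairs that are recurrent; higher means more synchronized."""
    return sum(map(sum, R)) / (len(R) * len(R[0]))

def vertical_entropy(R, l_min=2):
    """Shannon entropy (nats) of the distribution of vertical line lengths >= l_min."""
    counts = Counter()
    for j in range(len(R[0])):
        run = 0
        for i in range(len(R) + 1):
            if i < len(R) and R[i][j]:
                run += 1  # extend the current vertical line in column j
            else:
                if run >= l_min:
                    counts[run] += 1  # record a completed line of this length
                run = 0
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log(total / c) for c in counts.values())

R = cross_recurrence([0.0, 0.0, 5.0, 0.0, 0.0, 0.0], [0.0], eps=0.1)
print(recurrence_rate(R), vertical_entropy(R))  # lines of length 2 and 3 -> entropy log(2)
```

A more varied mix of vertical line lengths raises the entropy, which is why the abstract reads higher vertical entropy as higher variability.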

11

Molecular Clock Dating of Ancient Environmental DNA Reveals Damage Beyond Deamination

Lemmon-Kishi, M.; Pipes, L.; De Sanctis, B.; Nielsen, R.

2026-07-07 bioinformatics 10.64898/2026.07.03.735781 medRxiv

Top 0.1%

1.5%

Ancient environmental DNA (aeDNA) from permafrost, lake, cave, and marine sediments provides a rich source of genetic data that captures broad perspectives on past biodiversity. Accurate dating is crucial for discovering ecologically relevant patterns from aeDNA, and molecular clock dating would allow sample ages to be estimated from the recovered genetic material itself rather than from the geological components. However, the fragmented and damaged nature of short-read ancient DNA (aDNA) from multiple taxonomic sources poses significant challenges and has limited this dating approach for aeDNA. Here we developed ratePlacer, a phylogeny-based method for analyzing aeDNA that combines information from many short reads in a sample while accounting for DNA damage, providing maximum likelihood estimates of sample ages. Simulations demonstrate that ratePlacer accurately dates samples even under the fragmented, damaged conditions characteristic of aeDNA. It also outperforms Bayesian tip-dating approaches for the taxonomically mixed samples commonly found in aeDNA. Yet age estimates from re-dating Kap København varied across taxa, highlighting the difficulty of molecular clock dating in aeDNA. This dating also revealed elevated G→T and C→A mismatches consistent with oxidative damage. These patterns reveal aDNA damage beyond deamination that remains understudied, suggesting that aeDNA should be carefully evaluated in genomic and evolutionary analyses. The new dating method, ratePlacer, extends molecular clock dating of aDNA from single specimens to pooled environmental DNA data, where traditional methods struggle.
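Molecular clock dating of an ancient sample rests on one principle: its lineage stopped accumulating substitutions when the organism died, so its branch is shorter than that of a contemporaneous lineage. A deliberately minimal, single-lineage sketch follows. It assumes a strict clock with a known rate, a known split time, Poisson-distributed substitutions and no damage. It is not ratePlacer's model, which places many short reads on a phylogeny and accounts for damage. All names and numbers are hypothetical.

```python
import math

def ml_sample_age(k, n_sites, rate, t_split):
    """Strict-clock Poisson ML age (years before present) of an ancient sample.

    k        substitutions observed on the sample's lineage since its split
    n_sites  number of sites compared
    rate     substitutions per site per year (assumed known)
    t_split  age of the split from the reference lineage, in years
    """
    branch_years = k / (rate * n_sites)      # ML duration of the sample's branch
    age = max(t_split - branch_years, 0.0)   # ages cannot be negative
    se = math.sqrt(k) / (rate * n_sites)     # from the Poisson Fisher information
    return age, se

# Hypothetical inputs: 1,000 substitutions expected for a present-day lineage.
age, se = ml_sample_age(k=980, n_sites=1e5, rate=1e-8, t_split=1e6)
print(f"age ~ {age:,.0f} y, SE ~ {se:,.0f} y")
```

With these invented inputs, the standard error exceeds the age itself, a reminder of why pooling information across many reads and taxa matters. Any damage-induced mismatch counted as a substitution inflates `k`, which lengthens the inferred branch and pulls the age estimate toward the present. This is why damage has to be modeled.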

12

Towards a Robust cell-free DNA Isolation Protocol for NGS Applications in a Clinical Molecular Diagnostics Setting

Apweiler, M.; Broche, J.; Loitz, M.; Hackenbruch, L.; Ossowski, S.; Schroeder, C.; Schmit, K. J.

2026-06-24 health systems and quality improvement 10.64898/2026.06.15.26355337 medRxiv

Top 0.1%

1.4%

Cell-free DNA (cfDNA), released from apoptotic and necrotic cells into body fluids, represents a non-invasive source of genetic information for disease prediction, diagnosis, and monitoring. However, its low physiological abundance makes cfDNA highly susceptible to pre-analytical influences. In particular, genomic DNA (gDNA) released from lysed white blood cells (WBCs) can contaminate plasma and compromise downstream cfDNA analyses. This study evaluated the impact of different blood collection tubes and isolation methods on cfDNA stability and yield. Blood samples from 13 healthy donors were collected in cfDNA-stabilizing tubes (Cell-Free DNA BCT, Streck; S-Monovette cfDNA Exact, Sarstedt) and stored at room temperature for 1, 5, or 10 days before plasma isolation. cfDNA was extracted using either a magnetic bead-based method or a silica column-based approach. DNA quantity and quality were assessed by fluorometric quantification, automated fragment analysis, and gene-specific quantitative PCR. Streck-based workflows maintained stable cfDNA yields and characteristic mononucleosomal fragmentation profiles across all storage times. In contrast, Sarstedt tubes showed reduced cfDNA concentrations after 5 days and a pronounced increase at 10 days, accompanied by high-molecular-weight DNA patterns consistent with WBC lysis. These trends were largely independent of the extraction method. Overall, the results demonstrate that blood collection tube chemistry critically influences cfDNA integrity during delayed processing. Streck tubes, particularly when combined with QIAamp, provided the most robust and reproducible workflow for routine molecular diagnostics, whereas Sarstedt tubes produced physiologically implausible results after extended storage.

13

AutopsyPrint: A novel tool for translating ballistic and sharp force injury trajectory findings into 3D printable models

Parsons, C. E.; Thomsen, A. H.; Petersen, M. V.

2026-07-13 forensic medicine 10.64898/2026.07.10.26357738 medRxiv

Top 0.1%

1.3%

When autopsy findings are presented on two-dimensional paper-based models, spatial information is inevitably reduced, and the viewer must infer it from simplified anatomical and geometrical representations. Multiple diagrams representing trajectory angles must be integrated into a complete mental model, introducing potential for errors in the viewer's understanding. 3D models can address these issues but have seen limited adoption in forensic autopsy reporting, given the technical expertise and software required to produce them. Here, we present AutopsyPrint, a workflow and open web-based tool for generating 3D printable body models annotated with wound trajectories. The tool supports marking different wound types, including ballistic and stab wounds, on male and female bodies, which can be posed to accommodate a diversity of trajectories. Based on our testing, we present a set of suggested workflow steps and parameters to facilitate standardization of the 3D models produced, balancing precision against print time and materials. To ensure accessibility, the tool runs fully in the user's browser, and all annotated data are stored locally. By making AutopsyPrint open access, we intend to build practical experience with model creation and ultimately advance the use of 3D models in the field.
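Going from marked entry and exit points to the trajectory angles that paper diagrams report is simple vector geometry. The sketch below is not taken from AutopsyPrint. The axis convention and the choice of elevation/azimuth as summary angles are assumptions, and forensic reports may use other plane-specific angle conventions.

```python
import math

def trajectory_angles(entry, exit_point):
    """Direction of a wound track and two summary angles, in degrees.

    Hypothetical body frame: x = toward the subject's left, y = anterior
    (toward the front), z = superior (upward). Points are (x, y, z) tuples.
    """
    d = [b - a for a, b in zip(entry, exit_point)]
    length = math.sqrt(sum(c * c for c in d))
    if length == 0:
        raise ValueError("entry and exit points coincide")
    ux, uy, uz = (c / length for c in d)
    # Elevation: + means the track rises, - means it descends, 0 = horizontal.
    elevation = math.degrees(math.asin(uz))
    # Azimuth in the horizontal (transverse) plane, measured from anterior
    # toward the subject's left: 0 = front-directed, 180 = back-directed.
    azimuth = math.degrees(math.atan2(ux, uy)) % 360.0
    return (ux, uy, uz), elevation, azimuth

# Front-to-back, horizontal track: elevation 0, azimuth 180.
print(trajectory_angles((0, 10, 0), (0, 0, 0)))
```

Azimuth is undefined for a purely vertical track; the function then returns 0 for it, so elevation alone describes such a wound.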

14

Palaeoproteomic deconvolution of physical and genetic collagen mixtures

Engels, I.; Dedrie, T.; Saugen, S. M.; Van de Vyver, S.; Vandenbroucke, T.; Di Modica, K.; Decher, J.; Toso, A.; Deforce, D.; Daled, S.; Burnett, A.; Abrams, G.; Dhaenens, M.

2026-06-18 biochemistry 10.64898/2026.06.17.732552 medRxiv

Top 0.1%

1.1%

Species identification in palaeoproteomics relies on genome-derived protein sequences, which are often of poor quality, and the field lacks tools to cope with multi-species samples. Here, we address both challenges through the analysis of physical and genetic mixtures. Species that are absent from our database are treated as a genetic mixture, i.e. a patchwork of peptides from closely related species. Conversely, various overlapping peptide stretches allow us to resolve complex physical mixtures. We benchmark this approach by analysing physical mixtures of modern bone fragments, including genetic mixtures. We illustrate the impact of our approach via a rapid, high-throughput analysis of >2500 bone fragments, revealing the Eemian-era faunal environment around Scladina Cave, including the first Palaeoloxodon antiquus identified at this site.

15

SexPeptID: an automated and reproducible workflow for paleoproteomics sex estimation in archaeological enamel

Morvan, M.

2026-06-09 evolutionary biology 10.64898/2026.06.05.730301 medRxiv

Top 0.1%

1.0%

Accurate biological sex estimation is a key objective in archaeological and bioanthropological research but remains challenging when skeletal remains are fragmented, juvenile, or poorly preserved. Paleoproteomics approaches based on the detection of sex-specific amelogenin peptides (AMELX/AMELY) have emerged as a powerful alternative to osteological and genetic methods. However, current workflows often lack standardized criteria for peptide-level confidence assessment, potentially affecting the reproducibility and reliability of sex assignments. In this study, I evaluated the impact of peptide-level confidence filtering on paleoproteomics-based sex estimation through the reanalysis of 164 Homo sapiens individuals from 10 published datasets and 26 Bos taurus individuals from 3 datasets, spanning contexts from the Pleistocene to the present. To address methodological inconsistencies, I developed SexPeptID, an R/Shiny-based framework that integrates Posterior Error Probability (PEP) filtering, standardized peptide selection, and explicit uncertainty assessment. Application of SexPeptID revealed that peptide-level filtering substantially affects sex assignment outcomes: 17 previously classified males (10.4%) were reclassified as non-conclusive, while 5 individuals (3.1%) were identified as potentially female. Despite this sensitivity, AMELX/AMELY-based sex estimation remained robust overall, with stable signal ratios observed across archaeological periods. Variability in peptide intensities was primarily associated with dataset-specific factors rather than temporal differences, highlighting the influence of analytical workflows and preservation conditions. By incorporating confidence-based filtering and a non-conclusive classification category, SexPeptID improves the transparency, reproducibility, and reliability of palaeoproteomics sex estimation, providing a standardized framework for future archaeological and bioanthropological studies. 
Highlights:
- SexPeptID provides a reproducible framework for amelogenin-based sex estimation.
- Peptide-level confidence filtering significantly affects paleoproteomics sex estimates.
- 13.4% of published male assignments were revised after confidence filtering.
- AMELX/AMELY ratios show temporal stability from modern to Pleistocene samples.
- Standardized uncertainty assessment strengthens palaeoproteomics inference.
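The effect of peptide-level confidence filtering on sex calls can be sketched as a small decision rule. The thresholds (`pep_max`, `min_amely`), the category logic and the placeholder peptide sequences below are assumptions for illustration and are not SexPeptID's actual criteria. The sketch shows only one point: a male call that rests on AMELY peptides failing the PEP filter drops to "non-conclusive".

```python
from dataclasses import dataclass

@dataclass
class PeptideHit:
    protein: str   # "AMELX" or "AMELY"
    sequence: str  # peptide sequence (placeholders below are hypothetical)
    pep: float     # posterior error probability reported by the search engine

def estimate_sex(hits, pep_max=0.01, min_amely=1):
    """Hypothetical rule set (not SexPeptID's): 'male', 'potentially female' or 'non-conclusive'."""
    conf_y = {h.sequence for h in hits if h.protein == "AMELY" and h.pep <= pep_max}
    conf_x = {h.sequence for h in hits if h.protein == "AMELX" and h.pep <= pep_max}
    weak_y = any(h.protein == "AMELY" and h.pep > pep_max for h in hits)
    if len(conf_y) >= min_amely:
        return "male"                # Y-linked peptides pass the PEP filter
    if weak_y:
        return "non-conclusive"      # AMELY seen only below the confidence cut-off
    if conf_x:
        return "potentially female"  # AMELX confident, no AMELY; absence is not proof
    return "non-conclusive"          # too little amelogenin signal to decide

print(estimate_sex([PeptideHit("AMELY", "Y1", 0.2), PeptideHit("AMELX", "X1", 0.001)]))
```

Here the low-confidence AMELY hit would have supported a male call without filtering; with filtering, the call becomes non-conclusive.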

16

Conditional and marginal SNP-heritability to leverage ancestral and environmental diversity

Singh Sachan, A. N.; Schwartzman, A.; Azriel, D.

2026-05-29 genetics 10.64898/2026.05.28.728536 medRxiv

Top 0.1%

0.9%

SNP-heritability is defined as the fraction of variance of a trait that is explained by the SNPs in a genome-wide association study. Several methodologies have been proposed to estimate this quantity. More recent methods aim to do so with ancestrally diverse datasets, yet they still obtain a single heritability for the entire dataset, which we refer to as marginal heritability. However, the underlying subpopulations that compose a genetically diverse dataset might have different environmental and genetic exposures, and thus different heritabilities. To address this, we propose a conditional SNP-heritability approach that estimates multiple SNP-heritabilities within a dataset, corresponding to different ancestral compositions and environmental exposures. We take a careful statistical approach, including estimation of conditional genetic and environmental variances, and calculation of standard errors via a combination of the delta method with bootstrapping. We validate our method via extensive simulations. We then apply it to an ancestrally and socio-economically diverse dataset of 6603 subjects aged approximately 9 to 11 years from the Adolescent Brain Cognitive Development study. There, we illustrate how the SNP-heritability of intelligence scores can change due to differing extrinsic variances in different socio-economic groups, which is consistent with previous work in the literature. This conditional estimation approach can be a valuable tool for understanding differences in risks across subpopulations. Our work improves on existing methodology and allows us to leverage the heterogeneity of the data to obtain new insights.
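The delta-method step amounts to propagating the sampling covariance of the two variance-component estimates through the ratio h² = σ²_g / (σ²_g + σ²_e). A sketch follows. It assumes the covariance matrix has already been obtained, for instance from bootstrap replicates. It does not reproduce how the paper estimates the conditional variances themselves, and the subgroup values and function name are invented for illustration.

```python
import math

def h2_with_se(var_g, var_e, cov):
    """SNP-heritability h2 = var_g / (var_g + var_e) with a delta-method SE.

    cov is the 2x2 sampling covariance of the (var_g, var_e) estimates,
    for example estimated from bootstrap replicates.
    """
    total = var_g + var_e
    h2 = var_g / total
    grad = (var_e / total**2, -var_g / total**2)  # dh2/dvar_g, dh2/dvar_e
    var_h2 = sum(grad[i] * cov[i][j] * grad[j] for i in range(2) for j in range(2))
    return h2, math.sqrt(var_h2)

# Hypothetical subgroups sharing the same genetic variance but different
# environmental variance: h2 differs purely because var_e differs.
cov = [[0.01, 0.0], [0.0, 0.01]]
print(h2_with_se(0.4, 0.6, cov))  # h2 is 0.4 here
print(h2_with_se(0.4, 1.2, cov))  # h2 drops to 0.25
```

The second call mirrors the abstract's point that differing extrinsic (environmental) variances alone can change SNP-heritability across groups.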

17

Multi-feature Classification to Improve Colorimetric Loop-Mediated Isothermal Amplification Fidelity

Melton, G.; Negron, D. A.; Hauser, K.; Jagannathan, S.; Tolli, N.; Jennings, K.; Necciai, B.; Sozhamannan, S.; Abramson, B.

2026-06-08 bioinformatics 10.64898/2026.06.03.728514 medRxiv

Top 0.1%

0.6%

Loop-mediated isothermal amplification (LAMP) is a cost-effective and portable assay technique for performing nucleic acid-based diagnostics in the field, but its adoption is hindered by design and reproducibility issues. These stem from a complex primer design process that fine-tunes parameters across 6-8 binding regions. The likelihood of assay success depends on satisfying thermodynamic and secondary structure constraints while maintaining target specificity and avoiding overlaps between multiple primers. Software such as the NEB® LAMP Primer Design Tool, PREMIER Biosoft LAMP Designer, Primer3, PCR Signature Erosion Tool (PSET), and PrimerExplorer enables researchers to automate this task. However, in our experience, these programs can sometimes yield inconsistent results in laboratory testing. Here, we approached the issue by comparing and training multiple machine learning (ML) models on primer sets from working and failing assays targeting various organisms. Our aim was to determine significant features and improve predictions before primer sets are ordered. A literature review produced an initial list of primer sets (n=116), which were then filtered based on reference template availability to discern their FIP/BIP components (F2/F1c and B1c/B2). The final training set (n=109) included sequence and thermodynamic features derived from primers collected from the review (n=74) and those designed in-house with PSET (n=35). Failing assays were difficult to obtain from publications, so we provided our own (n=23). Using WEKA Experimenter, we built models based on decision tree and Bayesian learning algorithms. The experimental scheme performed a parameter grid search, seeded replicates, feature selection, and cross-validation while avoiding data leakage, and it output logs for model comparison, feature analysis, and overfitting assessment.
Notably, thermodynamic features associated with the F1c and B1c primers consistently ranked near the top by consensus of information gain, class-correlation, and model-based feature ranking. For classification, the NaiveBayes algorithm achieved true-positive (TP) and true-negative (TN) rates of 0.90 (± 0.02) and 0.73 (± 0.05), respectively, with a Cohen's kappa coefficient of 0.61 (± 0.06) and an F-score of 0.91 (± 0.01). This work highlights how a practical model was built from a small, imbalanced training set incorporating negative research results, more of which are needed to improve generalization and refine parameters critical to assay success.
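The classification metrics reported above can all be derived from a binary confusion matrix. The following is a minimal sketch, assuming working assays are the positive class; the `Confusion` class and the function names are hypothetical, and the counts in the example are illustrative rather than the study's data. Tools such as WEKA may also report class-weighted averages, so values computed this way need not match the reported figures exactly.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # working assays predicted as working
    fn: int  # working assays predicted as failing
    fp: int  # failing assays predicted as working
    tn: int  # failing assays predicted as failing


def tp_rate(c: Confusion) -> float:
    """Sensitivity: share of working assays classified correctly."""
    return c.tp / (c.tp + c.fn)


def tn_rate(c: Confusion) -> float:
    """Specificity: share of failing assays classified correctly."""
    return c.tn / (c.tn + c.fp)


def f_score(c: Confusion) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    precision = c.tp / (c.tp + c.fp)
    recall = tp_rate(c)
    return 2 * precision * recall / (precision + recall)


def cohens_kappa(c: Confusion) -> float:
    """Observed agreement corrected for agreement expected by chance."""
    n = c.tp + c.fn + c.fp + c.tn
    p_observed = (c.tp + c.tn) / n
    # Chance agreement from the true and predicted class marginals.
    p_yes = ((c.tp + c.fn) / n) * ((c.tp + c.fp) / n)
    p_no = ((c.tn + c.fp) / n) * ((c.tn + c.fn) / n)
    p_expected = p_yes + p_no
    return (p_observed - p_expected) / (1 - p_expected)


c = Confusion(tp=8, fn=2, fp=1, tn=4)
print(tp_rate(c), tn_rate(c), f_score(c), cohens_kappa(c))
```

With these illustrative counts, both rates are 0.8, the F-score is 16/19 (about 0.84) and kappa is 4/7 (about 0.57). Kappa sits below the F-score because it discounts the agreement expected from the class marginals alone, which is why it is informative on an imbalanced training set.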

18

Material-specific quarantine durations for SARS-CoV-2 inactivation on musical instruments and music-related materials

Pastorino, B.; Touret, F.; Creton, M.; Viala, R.; Morand, J. C.; Reyre, F.; Jousserand, M.; Billecard, F.; Charrel, R. N. C.

2026-07-01 microbiology 10.64898/2026.07.01.735763 medRxiv

Top 0.1%

0.5%


The COVID-19 pandemic has forced a reevaluation of safety protocols across various sectors, including the arts. This study addresses a critical gap in understanding SARS-CoV-2 persistence on materials commonly associated with musical instruments and scores, such as alloys, varnishes, reeds, and paper. While previous research has explored viral survival on various surfaces, limited data exist for materials specific to musical contexts. In this work, we investigate the efficacy of quarantine as a non-destructive method for inactivating SARS-CoV-2 on 16 materials, including brass, silver plating, ABS plastic, ebonite, and various varnishes and paper types. Results revealed significant variability in viral persistence across materials. Non-porous surfaces such as metals and ABS plastic cleared infectivity within 3 days, while porous materials such as reeds and music scores required up to 7 days. Gold-plated brass and certain varnishes showed intermediate persistence, with infectivity clearing after 4 days. These findings agree with prior studies indicating that SARS-CoV-2 survival is highly dependent on surface composition, with porous and organic-coated materials retaining viable virus longer due to reduced environmental stress. Our results highlight the feasibility of stratified quarantine protocols based on material type, offering practical guidelines for musicians and institutions and critical insights for mitigating SARS-CoV-2 transmission risks in musical settings.

19

Directional Gene-Level Concordance and Methodological Constraints in Blood Transcriptomic and DNA Methylation Studies of Parkinson's Disease

Kaur, R.; Dewan, C.; Chauhan, I.; Sharma, K.; Sharma, S.

2026-05-20 neuroscience 10.64898/2026.05.17.725808 medRxiv

Top 0.1%

0.5%


Assessing reproducibility across different molecular profiling studies is a persistent methodological challenge (Zhang et al., 2009; Sweeney et al., 2017; Ioannidis, 2005). Differences in platform technology, cohort composition, analytical pipelines, and feature definitions often make it difficult to interpret cross-study comparisons based solely on gene-identity overlap. In this study, we conducted a retrospective computational analysis of seven publicly available analytical datasets (including alternative analytical pipelines applied to the same cohort) derived from five biologically independent peripheral blood transcriptomic and DNA methylation cohorts, comprising 3,487 samples (1,824 Parkinson's disease cases and 1,663 controls). Reproducibility was evaluated using gene-identity overlap, enrichment-based comparisons, and a permutation-based framework to assess directional consistency of effect estimates across datasets. We also tested the robustness of results by varying false discovery rate thresholds and applying alternative probe-to-gene collapsing strategies. All analyses were performed using reproducible workflows implemented in R and Python with fixed random seeds. Across independent cohorts, gene-identity overlap was generally limited, with enrichment ratios close to one, especially when datasets were generated using different platforms. In several datasets, limited numbers of statistically significant features further constrained overlap-based comparisons. In contrast, directional consistency showed greater stability. High directional consistency was observed in independent cohort comparisons restricted to overlapping statistically significant features and remained stable across statistical thresholds (90.0% at FDR < 0.05 and 82.8% at FDR < 0.10).
When evaluated across the full shared gene universe without conditioning on statistical significance, directional consistency was substantially lower (~30 to 32%) but remained significantly above permutation-based null expectations. Permutation testing confirmed that the observed directional consistency exceeded what would be expected by chance. A combined analysis including methodological replicates (n ≥ 3 datasets) showed 98.3% directional consistency; however, this estimate includes non-independent analytical pipelines applied to the same cohort and reflects analytical stability rather than independent biological replication. Rather than introducing a new statistical method, this study examines how commonly used reproducibility metrics behave under cross-study heterogeneity and identifies their practical limitations and appropriate boundaries of use.
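Directional consistency and a permutation null for it can be sketched as follows. This is one simple formulation, assuming consistency is the fraction of shared genes whose effect estimates have the same sign, and that the null is built by shuffling one dataset's estimates across genes. The abstract does not give the authors' exact definition, so their figures may rest on a different formula; the function names and toy values are hypothetical. The fixed seed mirrors the reproducibility practice described above.

```python
import random


def directional_consistency(effects_a: dict[str, float], effects_b: dict[str, float]) -> float:
    """Fraction of shared genes whose effect estimates point the same way."""
    shared = sorted(set(effects_a) & set(effects_b))
    # Genes with a zero estimate have no direction, so they are skipped.
    pairs = [(effects_a[g], effects_b[g]) for g in shared
             if effects_a[g] != 0 and effects_b[g] != 0]
    if not pairs:
        raise ValueError("no shared genes with non-zero effects")
    agree = sum(1 for a, b in pairs if (a > 0) == (b > 0))
    return agree / len(pairs)


def permutation_test(effects_a: dict[str, float], effects_b: dict[str, float],
                     n_perm: int = 1000, seed: int = 42) -> tuple[float, float]:
    """Observed consistency and a one-sided empirical p-value."""
    observed = directional_consistency(effects_a, effects_b)
    shared = sorted(set(effects_a) & set(effects_b))
    values_b = [effects_b[g] for g in shared]
    rng = random.Random(seed)  # fixed seed for reproducibility
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values_b)  # break the gene-to-effect link in dataset B
        null = directional_consistency(effects_a, dict(zip(shared, values_b)))
        if null >= observed:
            hits += 1
    # Adding 1 to both counts keeps the p-value away from exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


a = {"G1": 1.2, "G2": -0.5, "G3": 0.3, "G4": -2.0}
b = {"G1": 0.8, "G2": -0.1, "G3": -0.4, "G4": -1.0, "G5": 0.2}
print(directional_consistency(a, b))  # 0.75
```

Shuffling preserves each dataset's overall balance of positive and negative estimates, so the null reflects only the agreement expected from those marginals. With a finite number of permutations, the smallest attainable p-value is 1/(n_perm + 1).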

20

Multi-platform reassessment of human mitochondrial DNA methylation reveals signals consistent with technical artifacts

Basrai, S.; Bahcheli, A. T.; Tan, D.; Zuzarte, P. C.; Bevan, A.; Chan, T.; Ng, K.; Lam, B.; Arruda, A.; Das, S.; Minden, M. D.; Simpson, J. T.; Reimand, J.; Abelson, S.

2026-06-15 bioinformatics 10.64898/2026.06.10.730935 medRxiv

Top 0.1%

0.5%


The existence and functional relevance of mitochondrial DNA methylation remain controversial. Here, we systematically profiled cytosine methylation and hydroxymethylation across human brain and blood tissues spanning healthy and malignant states using orthogonal sequencing approaches that avoid chemical conversion during library preparation. While nuclear DNA exhibited canonical methylation patterns, mitochondrial DNA consistently showed negligible signal, indistinguishable from background technical noise. By mapping cytosine-guanine sites between mitochondrial DNA and nuclear-embedded mitochondrial sequences, we demonstrate the potential of these nuclear counterparts to confound not only cytosine methylation but also hydroxymethylation measurements, corroborating and extending prior findings implicating nuclear contamination as a potential source of apparent mitochondrial epigenetic signals. Additional technical factors that inflate apparent mtDNA methylation signals were identified, including sequence context biases, flow cell chemistries, and coverage-dependent discrepancies between the heavy and light strands. Collectively, these results provide convergent evidence against the presence of biologically meaningful cytosine methylation or hydroxymethylation in mitochondrial DNA. These findings caution against interpreting apparent mtDNA methylation signals in human adult tissues as meaningful without rigorous orthogonal validation and comprehensive consideration of technical and analytical confounding factors.
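The confounding step, comparing mitochondrial CpG sites against nuclear-embedded mitochondrial sequences (NUMTs), can be illustrated with a toy sketch. Instead of the alignment-based mapping a real pipeline would use, this assumes exact matching of a fixed-length window around each CpG against NUMT sequences on both strands. A CpG whose window also occurs in a NUMT is flagged, because reads covering it cannot be assigned unambiguously to mtDNA. The function names, the `flank` parameter and the sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def cpg_positions(seq: str) -> list[int]:
    """0-based positions of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def ambiguous_cpgs(mt_seq: str, numt_seqs: list[str], flank: int = 10) -> list[int]:
    """CpG positions whose flanking window also occurs in a NUMT (either strand)."""
    mt = mt_seq.upper()
    # Search both strands of each nuclear sequence.
    nuclear = []
    for s in numt_seqs:
        s = s.upper()
        nuclear.extend([s, reverse_complement(s)])
    flagged = []
    for pos in cpg_positions(mt):
        # Window = the CpG plus up to `flank` bases on each side (linear, not circular).
        window = mt[max(0, pos - flank): min(len(mt), pos + 2 + flank)]
        if any(window in s for s in nuclear):
            flagged.append(pos)
    return flagged


mt = "TTACGTAGGGCGCAT"
numt = "CCCTTACGTAGCCC"
print(cpg_positions(mt))                    # [3, 10]
print(ambiguous_cpgs(mt, [numt], flank=3))  # [3]
```

Methylation calls at flagged positions would then be treated with caution or excluded. A realistic version would need to tolerate mismatches, since NUMTs diverge from the mitochondrial reference.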